Hybridization and Treebank Enrichment with Constraint-Based Representations
Abstract
In this paper, we present a method for hybridizing constituency treebanks with constraint-based descriptions and enriching them with an evaluation of sentence grammaticality. This information is computed with a two-step technique: (1) induction of a constraint grammar from the source treebank and (2) evaluation of the constraints for every sentence, from which a grammaticality index is calculated. The method is theory-neutral and language-independent. Because the encoded information is precise, such enrichment is useful in several settings, for example when designing psycholinguistic experiments on comprehension or reading difficulty.
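The two steps described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tuple-based tree encoding, the restriction to a single constraint type (linear precedence between sibling categories), the function names, and the index itself, which is taken here to be the plain ratio of satisfied constraints. The abstract does not give the paper's constraint inventory or its index formula, so this should not be read as the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations

# Assumed encoding: an inner node is (label, [children]); a leaf is (pos_tag, word).

def local_trees(tree):
    """Yield (parent_label, [child_labels]) for every non-leaf node."""
    label, children = tree
    if isinstance(children, str):  # leaf: the second element is the word
        return
    yield label, [child[0] for child in children]
    for child in children:
        yield from local_trees(child)

def induce_linearity(treebank):
    """Step 1 (sketch): induce precedence constraints from the treebank.

    For each parent category, keep a pair (a, b) only if a was observed
    before b among the children and b was never observed before a.
    """
    seen = defaultdict(set)  # parent -> set of observed ordered pairs
    for tree in treebank:
        for parent, kids in local_trees(tree):
            for i, j in combinations(range(len(kids)), 2):
                if kids[i] != kids[j]:
                    seen[parent].add((kids[i], kids[j]))
    return {
        parent: {(a, b) for (a, b) in pairs if (b, a) not in pairs}
        for parent, pairs in seen.items()
    }

def evaluate(tree, grammar):
    """Step 2 (sketch): count satisfied and violated constraints in a tree."""
    satisfied = violated = 0
    for parent, kids in local_trees(tree):
        for a, b in grammar.get(parent, ()):
            if a in kids and b in kids:
                last_a = max(i for i, k in enumerate(kids) if k == a)
                first_b = kids.index(b)
                if last_a < first_b:
                    satisfied += 1
                else:
                    violated += 1
    return satisfied, violated

def grammaticality_index(tree, grammar):
    """Assumed index: share of applicable constraints that are satisfied."""
    satisfied, violated = evaluate(tree, grammar)
    total = satisfied + violated
    return satisfied / total if total else 1.0

# Toy treebank (hypothetical data, not from the paper).
treebank = [
    ("S", [("NP", [("Det", "the"), ("N", "cat")]), ("VP", [("V", "sleeps")])]),
    ("S", [("NP", [("Det", "a"), ("Adj", "big"), ("N", "dog")]),
           ("VP", [("V", "barks")])]),
]
grammar = induce_linearity(treebank)

# A tree whose NP has the noun before the determiner.
scrambled = ("S", [("NP", [("N", "cat"), ("Det", "the")]),
                   ("VP", [("V", "sleeps")])])

print(grammaticality_index(treebank[0], grammar))  # 1.0
print(grammaticality_index(scrambled, grammar))    # 0.5
```

In this sketch, the scrambled tree satisfies the induced NP-before-VP constraint but violates Det-before-N, so half of its applicable constraints hold. A real system would presumably cover more constraint types and weight them, which is why the ratio used here is only a placeholder.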
Similar resources
Optimization of Hybrid Composite Laminate Based on the Frequency using Imperialist Competitive Algorithm
The imperialist competitive algorithm (ICA) is a new socio-politically motivated global search strategy. The ICA is applied to hybrid composite laminates to obtain minimum weight and cost. The approach chosen for the multi-objective optimization is the weighted sum method (WSM). The hybrid composite laminates are made of glass/epoxy and carbon/epoxy to combine the lightness and...
Linking Flat Predicate Argument Structures
This report presents an approach to enriching flat and robust predicate argument structures with more fine-grained semantic information, extracted from underspecified semantic representations and encoded in Minimal Recursion Semantics (MRS). Such representations are provided by a hand-built HPSG grammar with wide linguistic coverage. A specific semantic representation, called linked predicate...
Automatic Morphological Enrichment of a Morphologically Underspecified Treebank
In this paper, we study the problem of automatic enrichment of a morphologically underspecified treebank for Arabic, a morphologically rich language. We show that we can map from a tagset of size six to one with 485 tags at an accuracy rate of 94%-95%. We can also identify the unspecified lemmas in the treebank with an accuracy over 97%. Furthermore, we demonstrate that using our automatic anno...
Treebank-Based Acquisition of Chinese LFG Resources for Parsing and Generation
This thesis describes a treebank-based approach to automatically acquire robust, wide-coverage Lexical-Functional Grammar (LFG) resources for Chinese parsing and generation, which is part of a larger project on the rapid construction of deep, large-scale, constraint-based, multilingual grammatical resources. I present an application-oriented LFG analysis for Chinese core linguistic phenomena an...
Automatic Annotation of the Penn-Treebank with LFG F-Structure Information
Lexical-Functional Grammar f-structures are abstract syntactic representations approximating basic predicate-argument structure. Treebanks annotated with f-structure information are required as training resources for stochastic versions of unification and constraint-based grammars and for the automatic extraction of such resources. In a number of papers (Frank, 2000; Sadler, van Genabith and Wa...